Relationship between temperature and load

Read in the TS objects. We set the dates as the index and remove the time/date column and the X column. The temperature file has 11 zones, and I assume these are the same as the first 11 zones in the load TS.

Because we have so many zones, we might want to simplify the data and use only the sum over all the zones. Otherwise, the temperature vs. load plot will show a separate cluster for each zone and the TS plot will have many more lines. A simple mean should also be OK, since all zones are in the same units.
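A minimal sketch of this aggregation, using made-up data rather than the real files. The names `load_zones` and `temp_zones` are hypothetical, and summing the load while averaging the temperature is my reading of the paragraph above, not something the data dictate.

```r
# Synthetic stand-in for the zone-level data: 48 hours x 3 zones (made-up values).
set.seed(1)
load_zones <- matrix(runif(48 * 3, min = 100, max = 200), ncol = 3,
                     dimnames = list(NULL, paste0("zone", 1:3)))
temp_zones <- matrix(rnorm(48 * 3, mean = 56, sd = 10), ncol = 3,
                     dimnames = list(NULL, paste0("zone", 1:3)))

# One value per hour: total load and average temperature across zones.
load_total <- rowSums(load_zones)
temp_mean  <- rowMeans(temp_zones)
```

Both results have one entry per hour, so they can be plotted against each other directly.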

## [1] 39576

The figure shows a relationship between temperature and load: as the temperature moves away from the mean (around 56 °F) in either direction, the energy load increases.

We make time series objects with zoo and ts. The zoo object keeps the timestamps, whereas the ts object keeps only the order. load_zoo_zones includes all the zones, while load_zoo holds the sum of all the loads at each time unit.
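To illustrate the difference between the two object types, here is a small sketch on a made-up hourly series. The names `load_zoo_demo` and `load_ts_demo` are hypothetical, and the frequency of 24 is an assumption chosen to match hourly data with a daily cycle.

```r
library(zoo)

# Hypothetical hourly series; the real data are read from file earlier.
times  <- seq(as.POSIXct("2004-01-01 01:00:00", tz = "UTC"),
              by = "hour", length.out = 72)
values <- 100 + 10 * sin(2 * pi * (1:72) / 24)

load_zoo_demo <- zoo(values, order.by = times) # keeps the timestamps as index
load_ts_demo  <- ts(values, frequency = 24)    # keeps only order and frequency

head(index(load_zoo_demo), 2) # POSIXct timestamps
head(time(load_ts_demo), 2)   # positions in units of the 24-hour cycle, not dates
```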

## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo

One year

Here we look at 2004 and see how each zone behaves throughout the year. Because the first figure shows large differences in the magnitude of load between zones, we standardize the data and plot it again. The standardized plot is much nicer. There are some white lines in the plot (zone 4), which are outliers in that zone (values close to 0). What to do about this? A similar thing occurred in zone 9 and was removed.
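The standardization step can be sketched as follows, assuming it means centering each zone on its mean and dividing by its standard deviation. The data are made up, and `zones` and `zones_std` are hypothetical names.

```r
set.seed(2)
# Made-up zones with very different magnitudes.
zones <- cbind(zone1 = rnorm(100, mean = 10000, sd = 2000),
               zone2 = rnorm(100, mean = 150, sd = 30))

# scale() centers each column on its mean and divides by its standard deviation.
zones_std <- scale(zones)

colMeans(zones_std)      # close to 0 for each zone
apply(zones_std, 2, sd)  # 1 for each zone
```

After this step the zones share a common scale, so a single colour range in the image plot covers all of them.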

## [1] "2004-01-01 01:00:00 CET"

Compare Years

To compare years, we extract each year from the data, where each measurement is the sum of the load over all zones. We then draw an image plot and a line plot to compare the years. Because the measurements vary more within a day than across months, we smooth the series by averaging the load over each 24 hours. We then do the same for weeks and months.
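The smoothing described here is a block average over non-overlapping windows. A compact sketch of that idea, using a hypothetical helper `block_mean` and made-up values:

```r
# Average a numeric vector over non-overlapping windows of `width` values.
# Incomplete trailing windows are dropped. `block_mean` is a hypothetical helper.
block_mean <- function(x, width) {
  n_blocks <- length(x) %/% width
  # Each column of the matrix holds one window of consecutive values.
  blocks <- matrix(x[seq_len(n_blocks * width)], nrow = width)
  colMeans(blocks)
}

hourly <- rep(c(1, 3), times = 36) # 72 made-up hourly values
daily  <- block_mean(hourly, 24)   # one mean per 24-hour window
```

The same helper with `width = 24 * 7` or `width = 24 * 30` would give weekly or (approximately) monthly averages.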

I am here, start by making plots beautiful

# Sum everything and compare years


smoothpars <- c(0, 24, 24*7, 24*30) # window in hours: 0 = hourly (no smoothing), 24 = day, 24*7 = week, 24*30 = month


# Number of hourly observations in each year
size_year <- numeric(length(years))
for (i in seq_along(years)) {
  year <- years[i]
  temp_start <- paste0(year, "-01-01 01:00:00")
  temp_end <- paste0(year, "-12-31 24:00:00") # R reads 24:00:00 as midnight at the start of the next day

  # The last year runs to the end of the series; otherwise subtract 1 to drop the first value of the next year
  if (i == length(years)) {
    yend <- length(load_zoo)
  } else {
    yend <- which(as.POSIXct(temp_end) == index(load_zoo)) - 1
  }
  ystart <- which(as.POSIXct(temp_start) == index(load_zoo))

  size_year[i] <- length(load_zoo[ystart:yend])
}

counter <- 1
labels <- c("Hours", "Days", "Weeks", "Months")
titles <- c("At each hour", "Average load over the day", "Average load over the week", "Average load over the months")
for (s in smoothpars){
  
  # One row per year, padded with NA where a year has fewer observations
  year_zoo <- matrix(data = NA, nrow = length(years), ncol = max(size_year))

  for (i in seq_along(years)) {
    year <- years[i]
    temp_start <- paste0(year, "-01-01 01:00:00")
    temp_end <- paste0(year, "-12-31 24:00:00")

    # The last year runs to the end of the series; otherwise subtract 1 to drop the first value of the next year
    if (i == length(years)) {
      yend <- length(load_zoo)
    } else {
      yend <- which(as.POSIXct(temp_end) == index(load_zoo)) - 1
    }
    ystart <- which(as.POSIXct(temp_start) == index(load_zoo))

    temp <- coredata(load_zoo[ystart:yend])
    year_zoo[i, seq_along(temp)] <- temp
  }
  
  
  
  if (s > 0) {

    # Average over non-overlapping windows of s hours
    year_zoo_smooth <- matrix(data = NA, nrow = nrow(year_zoo), ncol = ceiling(max(size_year) / s))
    idx1 <- 1 # column in the smoothed matrix
    idx2 <- 1 # first hour of the current window
    while (idx2 <= ncol(year_zoo) - (s - 1)) {
      year_zoo_smooth[, idx1] <- rowMeans(year_zoo[, idx2:(idx2 + s - 1)])
      idx2 <- idx2 + s
      idx1 <- idx1 + 1
    }
    year_zoo <- year_zoo_smooth
  }



  # Heatmap: years along the x axis, time (in units of the current window) along the y axis
  image(x = 1:nrow(year_zoo),
        y = 1:ncol(year_zoo),
        z = year_zoo,
        xlab = "Year (1 = 2004, ..., 5 = 2008)", ylab = labels[counter],
        main = "'Heatmap' of load per year")

  # Reshape to one column per year for the line plot
  year_zoo <- t(year_zoo)
  colnames(year_zoo) <- c("year1", "year2", "year3", "year4", "year5")
  year_zoo <- as.data.frame(year_zoo)

  p2 <- ggplot(data = year_zoo, aes(x = seq_len(nrow(year_zoo)))) +
    geom_line(aes(y = year1, color = "2004")) +
    geom_line(aes(y = year2, color = "2005")) +
    geom_line(aes(y = year3, color = "2006")) +
    geom_line(aes(y = year4, color = "2007")) +
    geom_line(aes(y = year5, color = "2008")) +
    scale_colour_manual("",
                        breaks = c("2004", "2005", "2006", "2007", "2008"),
                        values = c("2004" = "yellow", "2005" = "red", "2006" = "green", "2007" = "blue", "2008" = "black")) +
    xlab(labels[counter]) +
    ylab("Load") +
    labs(title = "Load over time", subtitle = titles[counter])

  print(p2)
  counter <- counter + 1
}

## Warning: Removed 24 rows containing missing values (geom_path).

## Warning: Removed 24 rows containing missing values (geom_path).

## Warning: Removed 24 rows containing missing values (geom_path).
## Warning: Removed 4415 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_path).

## Warning: Removed 2 rows containing missing values (geom_path).

## Warning: Removed 2 rows containing missing values (geom_path).
## Warning: Removed 185 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 28 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 7 rows containing missing values (geom_path).

Forecast

Lagplot

Here, we investigate how values in the time series are correlated with one another. We do this for different numbers of lags to see whether the data are seasonal. In the following figure we can see that the autocorrelation is highest at lag 1 (not far from 1). As the lag increases, the correlation decreases until around lags 22-24, where it increases a little. This can be seen more clearly in the ACF plot.
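As a rough illustration of what a lag plot shows, the sketch below uses a synthetic hourly series with a daily cycle instead of the real load data. The chosen lags (1, 6, 12, 24) are only an example.

```r
set.seed(3)
hours <- 1:(24 * 14)
# Synthetic hourly series with a daily cycle plus noise (not the real load data).
x <- ts(100 + 20 * sin(2 * pi * hours / 24) + rnorm(length(hours), sd = 3))

# Scatter plots of the series against lagged copies of itself.
lag.plot(x, set.lags = c(1, 6, 12, 24), do.lines = FALSE)

# The same relation as numbers: correlation between the series and its lagged copy.
lag_cor <- sapply(c(1, 6, 12, 24),
                  function(k) cor(x[-(1:k)], x[1:(length(x) - k)]))
```

For a series with a 24-hour cycle, the scatter at lag 24 should lie close to the diagonal again, while lag 12 should show a negative relation.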

Autocorrelations

The autocorrelation decreases from lag 1, but around lag 15 it starts increasing, reaching a peak at lag 24. This behaviour repeats every 24 hours, with the peaks decreasing a little each time. However, there are some intervals where the peaks increase slightly (see bottom figure), namely at the 6th and 7th daily peaks. From this we learn that there is autocorrelation at lags 24, 48, 72, … and also an increase in autocorrelation at lags 6*24 = 144 and 7*24 = 168, meaning that the same weekdays are more correlated with each other than different weekdays are.
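The pattern described here (daily peaks plus a bump at the weekly lag) can be reproduced on synthetic data. This is only a sketch: the series below is made up, with a 24-hour cycle and a higher level on 5 out of every 7 days, and it is not meant to match the real load values.

```r
set.seed(4)
hours   <- 1:(24 * 7 * 8)                                # eight weeks of hourly data
daily   <- 20 * sin(2 * pi * hours / 24)                 # 24-hour cycle
weekday <- ifelse(((hours - 1) %/% 24) %% 7 < 5, 10, 0)  # higher on 5 of 7 days
x <- 100 + daily + weekday + rnorm(length(hours), sd = 3)

# plot = FALSE returns the values; the first element is lag 0.
a   <- acf(x, lag.max = 24 * 8, plot = FALSE)
rho <- a$acf[-1] # rho[k] is the autocorrelation at lag k

rho[c(12, 24, 3 * 24, 7 * 24)]
```

Comparing `rho` at multiples of 24 shows whether the peak at the weekly lag stands out from the other daily peaks.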

Partial autocorrelation

To isolate the direct effect of each lag, we compute the partial autocorrelation. Here it is easier to see what is going on. The first figure shows how the partial autocorrelation drops to almost no correlation around lag 10, but then rises and is positive for lags 15-16. The correlation is, however, greatest at lag 24 (negative correlation). This is consistent with what we saw earlier.

In the next figure, the lags are extended to 200. It is clear that the PACF is greatest at lag 24. It then decreases every 24 lags up until lags 6*24 and 7*24 (also consistent with what we saw earlier).
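To show why the PACF isolates direct effects, here is a sketch on a simulated AR(1) process, where by construction only lag 1 has a direct effect. The coefficient 0.8 and the series length are arbitrary choices for illustration.

```r
set.seed(5)
# AR(1) process: by construction only lag 1 has a direct effect.
x <- arima.sim(model = list(ar = 0.8), n = 2000)

p     <- pacf(x, lag.max = 48, plot = FALSE)
phi   <- as.vector(p$acf)        # phi[k] is the partial autocorrelation at lag k (no lag 0)
bound <- 1.96 / sqrt(length(x))  # approximate 95% band, as drawn in the PACF plot

which(abs(phi) > bound)          # lags flagged as significant
```

The ACF of such a series decays slowly over many lags, whereas the PACF is large only at lag 1. Applied to the load series, the same call would list the lags that fall outside the band.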

Extending the lags further, the weekly correlation decreases.

There seems to be a seasonal period of 24 hours. Also, when the lags are increased, there seems to be another seasonal period of 7 days. Now to partial autocorrelation.

There is significant partial autocorrelation at lags 1, 2, 3, 13-17, 24, 25, and 26. At higher lags, there seems to be partial autocorrelation every 24 hours and every 7 days, confirming a seasonal period of 24 hours as well as a weekly seasonal period. Open question: how to check for further seasonal periods, such as months or years?

For comparison, we plot the first day, week, and month to see how the time series behaves.
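A minimal sketch of extracting these windows, assuming hourly data and taking a month as 30 days (the same approximation as the 24*30 smoothing window). The series `load_demo` is made up.

```r
library(zoo)

# Made-up hourly series covering 40 days.
times <- seq(as.POSIXct("2004-01-01 01:00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 40)
load_demo <- zoo(100 + 20 * sin(2 * pi * seq_along(times) / 24), order.by = times)

# Subset by position: the first 24 hours, 7 days, and 30 days.
first_day   <- load_demo[1:24]
first_week  <- load_demo[1:(24 * 7)]
first_month <- load_demo[1:(24 * 30)]

plot(first_week, xlab = "Time", ylab = "Load")
```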